feat: add seed_oss, deepseek_v31, qwen3_coder_xml tool parsers by raullenchai · Pull Request #13 · raullenchai/Rapid-MLX

raullenchai · 2026-03-04T18:21:25Z

Summary

- Port 3 upstream vLLM tool parsers for the most popular MLX models:
  - seed_oss (aliases: seed_oss, seed, gpt_oss): GPT-OSS-20B XML format with <seed:tool_call> + <seed:think> thinking blocks
  - deepseek_v31 (aliases: deepseek_v31, deepseek_r1_0528): DeepSeek V3.1/R1-0528 unicode special tokens (simpler than V3: no code fence, no type prefix)
  - qwen3_coder_xml (aliases: qwen3_coder_xml, qwen3_xml): Qwen3-Coder XML format with <tool_call>/<function=...> and parameter type conversion
- Add 72 upstream regression tests across all 3 parsers, plus registration tests
- Update eval configs: server flags table in evals/README.md; GPT-OSS parser in evals/run_all_models.sh switched from minimax to seed_oss
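For readers unfamiliar with the V3.1 format, here is a minimal, non-streaming sketch of extracting tool calls from it. It is not the ported vLLM parser: the special-token strings are assumed from DeepSeek's published chat template (check them against the model's tokenizer), and `extract_v31_tool_calls` is a hypothetical helper. It illustrates the simplification noted above: each call is just a name, a separator token, and raw JSON arguments, with no `function` type prefix and no JSON code fence.

```python
import json
import re

# Assumed DeepSeek V3.1 special tokens (full-width bars, "▁" separators).
# Verify against the tokenizer before relying on these exact strings.
CALLS_BEGIN = "<｜tool▁calls▁begin｜>"
CALLS_END = "<｜tool▁calls▁end｜>"
CALL_BEGIN = "<｜tool▁call▁begin｜>"
TOOL_SEP = "<｜tool▁sep｜>"
CALL_END = "<｜tool▁call▁end｜>"

# V3.1 layout: NAME <sep> RAW_JSON. No "function" prefix, no fenced JSON.
CALL_RE = re.compile(
    re.escape(CALL_BEGIN) + r"(?P<name>.*?)" + re.escape(TOOL_SEP)
    + r"(?P<args>.*?)" + re.escape(CALL_END),
    re.DOTALL,
)


def extract_v31_tool_calls(text: str) -> tuple[str, list[dict]]:
    """Split output into (content before the tool-call section, parsed calls)."""
    if CALLS_BEGIN not in text:
        return text, []
    content = text.split(CALLS_BEGIN, 1)[0]
    calls = [
        {"name": m.group("name").strip(), "arguments": json.loads(m.group("args"))}
        for m in CALL_RE.finditer(text)
    ]
    return content, calls


sample = (
    "Checking the weather."
    + CALLS_BEGIN + CALL_BEGIN + "get_weather" + TOOL_SEP
    + '{"city": "Paris"}' + CALL_END + CALLS_END
)
content, calls = extract_v31_tool_calls(sample)
print(content)  # Checking the weather.
print(calls)    # [{'name': 'get_weather', 'arguments': {'city': 'Paris'}}]
```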

Motivation

MLX download rankings show these are among the most popular models:

- GPT-OSS-20B (#8, 542K downloads): was using the minimax parser, which scored only 17% on tool calling
- DeepSeek-R1-0528 (#14-15, 681K): was using the V3 parser, which expects a different token format
- Qwen3-Coder-Next (#4-6, 2.7M): hermes works (90%), but upstream has a dedicated XML parser

Test plan

- `python3.12 -m pytest tests/test_upstream_regression.py -v`: 72/72 pass
- `python3.12 -m pytest tests/test_tool_parsers.py tests/test_minimax_tool_parser.py -v`: 143/143 pass (no regressions)
- All parser aliases are registered and discoverable
- Manual test with GPT-OSS-20B using `--tool-call-parser seed_oss` (if a server is available)

🤖 Generated with Claude Code

Port 3 upstream vLLM tool parsers for popular MLX models:
- seed_oss: GPT-OSS-20B XML format (<seed:tool_call> + <seed:think>)
- deepseek_v31: DeepSeek V3.1/R1-0528 unicode special tokens
- qwen3_coder_xml: Qwen3-Coder XML format (<tool_call>/<function=...>)

Includes 72 upstream regression tests and eval config updates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
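The seed_oss and qwen3_coder_xml formats share a `<function=NAME>` / `<parameter=KEY>` layout and differ mainly in their wrapper tags. The following is a minimal, non-streaming sketch of that layout, assuming the tag shapes listed in the comment; `parse_xml_tool_calls` and `strip_seed_think` are hypothetical helpers, not the parsers added in this PR, and values are left as raw strings (schema-based type conversion is a separate step).

```python
import re

# Assumed layout (seed_oss swaps <tool_call> for <seed:tool_call>):
#   <tool_call>
#   <function=NAME>
#   <parameter=KEY>
#   VALUE
#   </parameter>
#   </function>
#   </tool_call>
FUNC_RE = re.compile(r"<function=([^>]+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([^>]+)>(.*?)</parameter>", re.DOTALL)
THINK_RE = re.compile(r"<seed:think>.*?</seed:think>", re.DOTALL)


def strip_seed_think(text: str) -> str:
    """Drop <seed:think>...</seed:think> reasoning blocks."""
    return THINK_RE.sub("", text)


def parse_xml_tool_calls(
    text: str, start: str = "<tool_call>", end: str = "</tool_call>"
) -> list[dict]:
    """Return [{"name": ..., "arguments": {...}}]; values stay raw strings."""
    block_re = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    calls = []
    for block in block_re.findall(text):
        for name, body in FUNC_RE.findall(block):
            # Trim the newlines that surround multi-line parameter values.
            args = {k.strip(): v.strip("\n") for k, v in PARAM_RE.findall(body)}
            calls.append({"name": name.strip(), "arguments": args})
    return calls


qwen_out = (
    "<tool_call>\n<function=get_weather>\n<parameter=city>\nParis\n"
    "</parameter>\n</function>\n</tool_call>"
)
seed_out = (
    "<seed:think>need a sum</seed:think><seed:tool_call>\n<function=add>\n"
    "<parameter=a>1</parameter>\n</function>\n</seed:tool_call>"
)
print(parse_xml_tool_calls(qwen_out))
# [{'name': 'get_weather', 'arguments': {'city': 'Paris'}}]
print(parse_xml_tool_calls(strip_seed_think(seed_out), "<seed:tool_call>", "</seed:tool_call>"))
# [{'name': 'add', 'arguments': {'a': '1'}}]
```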

- Fix GLM47 test_streaming_no_tool_calls to match current strip_think_tags behavior (strips leading whitespace from content deltas)
- Add multi-step streaming tests for seed_oss and qwen3coder that verify header + { + params + } are all emitted across multiple calls
- Add note that run_all_models.sh paths are machine-specific

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Fix GLM47 streaming: strip_think_tags was eating inter-word spaces on normal content deltas; now only strips when </think> is actually present
- Add multi-step streaming tests for seed_oss and qwen3coder that verify complete tool call emission (header + { + params + }) with fine-grained deltas matching realistic token boundaries
- Add note that run_all_models.sh paths are machine-specific

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
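The whitespace bug is easy to reproduce in isolation. The sketch below uses a hypothetical stand-in for `strip_think_tags` whose leading-whitespace stripping is taken from the commit message above; the real helper's internals are not shown in this PR. Applied to every delta, it turns " world" into "world"; guarding the call on `</think>` keeps ordinary deltas intact.

```python
def strip_think_tags(delta: str) -> str:
    """Hypothetical stand-in: drop text up to </think>, then leading whitespace."""
    if "</think>" in delta:
        delta = delta.split("</think>", 1)[1]
    return delta.lstrip()


def clean_content_delta(delta: str) -> str:
    # Guarded version: only strip when the closing tag is in this delta.
    # Ordinary deltas such as " world" pass through untouched.
    if "</think>" in delta:
        return strip_think_tags(delta)
    return delta


deltas = ["Hello", " world", "!"]
buggy = "".join(strip_think_tags(d) for d in deltas)
fixed = "".join(clean_content_delta(d) for d in deltas)
print(buggy)  # Helloworld!
print(fixed)  # Hello world!
```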

Streaming completeness (seed_oss + qwen3coder):
- When the function body is already complete at header-detection time, emit the full tool call (name + arguments) in one chunk instead of header-only. Prevents truncated output when coarse deltas or max_tokens leave no further parser calls.
- When tool_call_start is detected, fall through to header parsing instead of returning None; the header may already be available.

GLM47 streaming:
- Only call strip_think_tags when </think> is actually present in the delta, preventing inter-word spaces from being eaten on normal content.

Tests:
- Add coarse-delta streaming tests that verify complete arguments are emitted even with a single large chunk (seed_oss + qwen3coder).
- Fix GLM47 streaming test to expect preserved whitespace.

Other:
- Remove misleading MODEL_DIR env var reference from run_all_models.sh.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Set HarmonyToolParser.SUPPORTS_NATIVE_TOOL_FORMAT = True so multi-turn tool history uses native harmony tokens instead of plain text conversion ("[Calling tool: ...]"), which broke GPT-OSS tool flow understanding.
- Extend load_model_with_fallback to catch "Missing N parameters" errors (not just "parameters not in model") for VLM-packaged models like Qwen3.5-9B and Mistral-Small-3.2 that need strict=False.
- Update harmony and native format tests accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
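The widened fallback amounts to matching one more error message before retrying a non-strict load. The sketch below is hypothetical: `load_with_strict_fallback` only mimics the behavior described for `load_model_with_fallback`, the loader is passed in as a callable that accepts a `strict` keyword, and both the `ValueError` type and the message patterns are assumptions based on the commit text rather than on the repository's code.

```python
import re
from collections.abc import Callable
from typing import Any

# Assumed error fragments signalling a weight/parameter mismatch in
# VLM-packaged checkpoints (wording taken from the commit message).
_FALLBACK_PATTERNS = (
    re.compile(r"parameters not in model"),
    re.compile(r"Missing \d+ parameters"),
)


def load_with_strict_fallback(load: Callable[..., Any], path: str) -> Any:
    """Try a strict load; retry with strict=False only on known mismatch errors."""
    try:
        return load(path, strict=True)
    except ValueError as exc:
        if any(p.search(str(exc)) for p in _FALLBACK_PATTERNS):
            return load(path, strict=False)
        raise  # unrelated errors still surface


def fake_load(path: str, strict: bool = True) -> str:
    # Hypothetical loader that fails strict loading like a VLM-packaged model.
    if strict:
        raise ValueError("Missing 12 parameters: vision_tower...")
    return f"loaded {path} (strict={strict})"


print(load_with_strict_fallback(fake_load, "Qwen3.5-9B"))
# loaded Qwen3.5-9B (strict=False)
```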

- Add explicit parentheses in tokenizer.py fallback condition to clarify `or`/`and` precedence (behavior was correct but ambiguous to read).
- Fix _convert_param_value() in seed_oss and qwen3coder parsers: when schema says "number"/"float", always return float instead of silently coercing 3.0 → int(3). Removes lossy `fv - int(fv) != 0` check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Your Name and others added 7 commits March 4, 2026 10:21

raullenchai merged commit 7ffce24 into main Mar 4, 2026

raullenchai deleted the feat/upstream-tool-parsers branch March 4, 2026 21:10
